!pip install pandas
!pip install seaborn
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
Successfully installed seaborn-0.12.0
Here is a dataset of sales figures from different stores.
data = pd.read_csv('sales.csv')
data
| | net_sales | sq_ft | inventory | advertising | district_size | competing_stores |
|---|---|---|---|---|---|---|
| 0 | 231.0 | 3.0 | 294 | 8.2 | 8.200000 | 11 |
| 1 | 156.0 | 2.2 | 232 | 6.9 | 4.100000 | 12 |
| 2 | 10.0 | 0.5 | 149 | 3.0 | 4.300000 | 15 |
| 3 | 519.0 | 5.5 | 600 | 12.0 | 16.100000 | 1 |
| 4 | 437.0 | 4.4 | 567 | 10.6 | 14.100000 | 5 |
| 5 | 487.0 | 4.8 | 571 | 11.8 | 12.700000 | 4 |
| 6 | 299.0 | 3.1 | 512 | 8.1 | 10.100000 | 10 |
| 7 | 195.0 | 2.5 | 347 | 7.7 | 8.400000 | 12 |
| 8 | 20.0 | 1.2 | 212 | 3.3 | 2.100000 | 15 |
| 9 | 68.0 | 0.6 | 102 | 4.9 | 4.700000 | 8 |
| 10 | 570.0 | 5.4 | 788 | 17.4 | 12.300000 | 1 |
| 11 | 428.0 | 4.2 | 577 | 10.5 | 14.000000 | 7 |
| 12 | 464.0 | 4.7 | 535 | 11.3 | 15.000000 | 3 |
| 13 | 15.0 | 0.6 | 163 | 2.5 | 2.500000 | 14 |
| 14 | 65.0 | 1.2 | 168 | 4.7 | 3.300000 | 11 |
| 15 | 98.0 | 1.6 | 151 | 4.6 | 2.700000 | 10 |
| 16 | 398.0 | 4.3 | 342 | 5.5 | 16.000000 | 4 |
| 17 | 161.0 | 2.6 | 196 | 7.2 | 6.300000 | 13 |
| 18 | 397.0 | 3.8 | 453 | 10.4 | 13.900000 | 7 |
| 19 | 497.0 | 5.3 | 518 | 11.5 | 16.299999 | 1 |
| 20 | 528.0 | 5.6 | 615 | 12.3 | 16.000000 | 0 |
| 21 | 99.0 | 0.8 | 278 | 2.8 | 6.500000 | 14 |
| 22 | 0.5 | 1.1 | 142 | 3.1 | 1.600000 | 12 |
| 23 | 347.0 | 3.6 | 461 | 9.6 | 11.300000 | 6 |
| 24 | 341.0 | 3.5 | 382 | 9.8 | 11.500000 | 5 |
| 25 | 507.0 | 5.1 | 590 | 12.0 | 15.700000 | 0 |
| 26 | 400.0 | 8.6 | 517 | 7.0 | 12.000000 | 8 |
Before we perform multiple linear regression, let's first perform simple linear regression. We'll try to use square footage to predict net sales; our prediction rule will be
$$ \text{predicted net sales} = w_0 + w_1 (\text{square feet}) $$
px.scatter(data, x='sq_ft', y='net_sales', title='Net Sales vs. Square Feet')
It seems like $w_1^*$, the optimal slope, should be positive. To find $w_0^*$ and $w_1^*$, we'll solve the normal equations.
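As a reminder, the normal equations say that the optimal parameter vector $\vec{w}^*$ satisfies

$$ X^TX \vec{w}^* = X^T\vec{y} $$

where $X$ is the design matrix and $\vec{y}$ is the observation vector.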
def solve_normal_equations(X, y):
'''Returns the optimal parameter vector, w*, given a design matrix X and observation vector y.'''
return np.linalg.solve(X.T @ X, X.T @ y)
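Aside: solving $X^TX\vec{w} = X^T\vec{y}$ directly can be numerically fragile when $X^TX$ is nearly singular. A more robust alternative, sketched here but not used in the rest of this notebook, is `np.linalg.lstsq`:

```python
import numpy as np

def solve_least_squares(X, y):
    '''Like solve_normal_equations, but uses np.linalg.lstsq,
    which stays stable even when X.T @ X is ill-conditioned.'''
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Both approaches agree on a small, well-conditioned toy example.
X_toy = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_toy = np.array([1.0, 2.0, 2.0])
solve_least_squares(X_toy, y_toy)
```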
data['1'] = 1
X_one_feature_model = data[['1', 'sq_ft']]
X_one_feature_model.values
array([[1. , 3. ],
[1. , 2.20000005],
[1. , 0.5 ],
[1. , 5.5 ],
[1. , 4.4000001 ],
[1. , 4.80000019],
[1. , 3.0999999 ],
[1. , 2.5 ],
[1. , 1.20000005],
[1. , 0.60000002],
[1. , 5.4000001 ],
[1. , 4.19999981],
[1. , 4.69999981],
[1. , 0.60000002],
[1. , 1.20000005],
[1. , 1.60000002],
[1. , 4.30000019],
[1. , 2.5999999 ],
[1. , 3.79999995],
[1. , 5.30000019],
[1. , 5.5999999 ],
[1. , 0.80000001],
[1. , 1.10000002],
[1. , 3.5999999 ],
[1. , 3.5 ],
[1. , 5.0999999 ],
[1. , 8.6 ]])
y = data['net_sales']
w_one_feature_model = solve_normal_equations(X_one_feature_model, y)
w_one_feature_model
array([ 2.57700629, 85.38887328])
This is telling us that the best-fitting line to this dataset is
$$\text{predicted net sales} = 2.577 + 85.389 (\text{square feet})$$
To get predictions for all observations in my dataset:
X_one_feature_model @ w_one_feature_model
0     258.743626
1     190.432532
2      45.271443
3     472.215809
4     378.288057
5     412.443614
6     267.282505
7     216.049189
8     105.043658
9      53.810332
10    463.676930
11    361.210258
12    403.904694
13     53.810332
14    105.043658
15    139.199206
16    369.749178
17    224.588069
18    327.054721
19    455.138051
20    480.754689
21     70.888106
22     96.504769
23    309.976942
24    301.438063
25    438.060252
26    736.921317
dtype: float64
Let's draw a plot of our prediction rule.
px.scatter(data, x='sq_ft', y='net_sales', title='Net Sales vs. Square Feet')
x_range = np.linspace(0, 10)
fig = go.Figure()
fig.add_trace(go.Scatter(x = data['sq_ft'], y = y, mode = 'markers', name = 'actual'))
fig.add_trace(go.Scatter(x = x_range,
y = w_one_feature_model[0] + w_one_feature_model[1] * x_range,
name = 'linear prediction rule',
line=dict(color='red')))
fig.update_layout(xaxis_title = 'Square Feet', yaxis_title = 'Net Sales')
It's also worth calculating the mean squared error of this prediction rule, so that we can compare it to our later prediction rules.
def mean_squared_error(X, y, w):
    '''Returns the mean squared error of the prediction rule X @ w against the observations y.'''
    return np.mean((y - X @ w)**2)
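A quick sanity check on a tiny, hypothetical example (re-defining the function so this cell is self-contained): a rule that predicts perfectly should have an MSE of 0, and one that is off by exactly 2 on every observation should have an MSE of 4.

```python
import numpy as np

def mean_squared_error(X, y, w):
    return np.mean((y - X @ w)**2)

X_toy = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
w_toy = np.array([1.0, 2.0])   # predictions: 1, 3, 5
y_exact = X_toy @ w_toy        # observations that match the predictions exactly
y_off = y_exact + 2            # observations that are off by 2 everywhere

mean_squared_error(X_toy, y_exact, w_toy)   # 0.0
mean_squared_error(X_toy, y_off, w_toy)     # 4.0
```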
Let's now try to predict net sales from two variables: the square footage (size) of the store, and the number of competing stores in the area. Our model will be:
$$ \text{predicted net sales} = w_0 + w_1 (\text{square feet}) + w_2(\text{competitors}) $$
Suppose $w_0^*$, $w_1^*$, and $w_2^*$ are our prediction rule's optimal parameters. Do you expect $w_1^*$ to be positive or negative? What about $w_2^*$?
px.scatter(data, x='sq_ft', y='net_sales', title='Net Sales vs. Square Feet')
px.scatter(data, x='competing_stores', y='net_sales', title='Net Sales vs. Competing Stores')
Looking at separate scatter plots only tells part of the story. Let's look at a 3D scatter plot, with one axis for square footage, one axis for competing stores, and one axis for net sales.
fig = go.Figure()
fig.add_trace(go.Scatter3d(x=data['sq_ft'],
y=data['competing_stores'],
z=data['net_sales'], mode='markers'))
fig.update_layout(scene = dict(
xaxis_title = 'Square Feet',
yaxis_title = 'Competing Stores',
zaxis_title = 'Net Sales'))
Our goal is to find the best-fitting plane to this set of points.
At the start of this notebook, we fit a prediction rule with a single feature, square feet, and got that the weight of that feature was $w_1^* = 85.389$.
We are about to fit a prediction rule with two features, square feet and competing stores.
Question: Is the weight of the square feet feature, $w_1^*$, for this new prediction rule guaranteed to be equal to 85.389?
A. Yes
B. No
Answer: No.
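To see why, here's a small, made-up example: when two features are correlated, adding the second feature changes the weight that the first one gets.

```python
import numpy as np

def solve_normal_equations(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data where feature b is correlated with feature a.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 1.0, 2.0, 2.0])
y_toy = np.array([2.0, 4.0, 5.0, 7.0])
ones = np.ones_like(a)

w_one = solve_normal_equations(np.column_stack([ones, a]), y_toy)
w_two = solve_normal_equations(np.column_stack([ones, a, b]), y_toy)
w_one[1], w_two[1]   # the weight on a differs: ~1.6 alone vs. 2.0 with b included
```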
Our design matrix is:
$$ \begin{pmatrix} 1 & s_1 & c_1\\ 1 & s_2 & c_2\\ \vdots & \vdots & \vdots\\ 1 & s_n & c_n \end{pmatrix} $$
where $s_i$ is the size of the $i$th store, and $c_i$ is its number of competitors. In code:
X_two_feature_model = data[['1', 'sq_ft', 'competing_stores']].values
X_two_feature_model
array([[ 1. , 3. , 11. ],
[ 1. , 2.20000005, 12. ],
[ 1. , 0.5 , 15. ],
[ 1. , 5.5 , 1. ],
[ 1. , 4.4000001 , 5. ],
[ 1. , 4.80000019, 4. ],
[ 1. , 3.0999999 , 10. ],
[ 1. , 2.5 , 12. ],
[ 1. , 1.20000005, 15. ],
[ 1. , 0.60000002, 8. ],
[ 1. , 5.4000001 , 1. ],
[ 1. , 4.19999981, 7. ],
[ 1. , 4.69999981, 3. ],
[ 1. , 0.60000002, 14. ],
[ 1. , 1.20000005, 11. ],
[ 1. , 1.60000002, 10. ],
[ 1. , 4.30000019, 4. ],
[ 1. , 2.5999999 , 13. ],
[ 1. , 3.79999995, 7. ],
[ 1. , 5.30000019, 1. ],
[ 1. , 5.5999999 , 0. ],
[ 1. , 0.80000001, 14. ],
[ 1. , 1.10000002, 12. ],
[ 1. , 3.5999999 , 6. ],
[ 1. , 3.5 , 5. ],
[ 1. , 5.0999999 , 0. ],
[ 1. , 8.6 , 8. ]])
Using the function solve_normal_equations that we already built:
w_two_feature_model = solve_normal_equations(X_two_feature_model, y)
w_two_feature_model
array([303.49073761, 45.15092186, -21.5851804 ])
This is telling us that the best-fitting plane to this dataset is
$$\text{predicted net sales} = 303.491 + 45.151 (\text{square feet}) - 21.585 (\text{competing stores})$$
Note that the weight of $\text{square feet}$ in this prediction rule is different from its weight in the prediction rule that only had one feature!
XX, YY = np.mgrid[1:10:2, 0:16:2]
Z = w_two_feature_model[0] + w_two_feature_model[1] * XX + w_two_feature_model[2] * YY
plane = go.Surface(x=XX, y=YY, z=Z)
fig = go.Figure(data=[plane])
fig.add_trace(go.Scatter3d(x=data['sq_ft'],
y=data['competing_stores'],
z=data['net_sales'], mode='markers', marker = {'color': '#656DF1'}))
fig.update_layout(scene = dict(
xaxis_title = 'Square Feet',
yaxis_title = 'Competing Stores',
zaxis_title = 'Net Sales'))
As before, let's calculate the MSE:
mean_squared_error(X_two_feature_model, y, w_two_feature_model)
2677.2979832984
Note that this is significantly lower than the MSE of the model with just one feature:
mean_squared_error(X_one_feature_model, y, w_one_feature_model)
7125.5887473683
Let's fit a prediction rule using all of the features.
column_order = ['1', 'sq_ft', 'competing_stores', 'inventory', 'advertising', 'district_size']
X_all_features = data[column_order].values
X_all_features
array([[1.00000000e+00, 3.00000000e+00, 1.10000000e+01, 2.94000000e+02,
8.19999981e+00, 8.19999981e+00],
[1.00000000e+00, 2.20000005e+00, 1.20000000e+01, 2.32000000e+02,
6.90000010e+00, 4.09999990e+00],
[1.00000000e+00, 5.00000000e-01, 1.50000000e+01, 1.49000000e+02,
3.00000000e+00, 4.30000019e+00],
[1.00000000e+00, 5.50000000e+00, 1.00000000e+00, 6.00000000e+02,
1.20000000e+01, 1.61000004e+01],
[1.00000000e+00, 4.40000010e+00, 5.00000000e+00, 5.67000000e+02,
1.06000004e+01, 1.41000004e+01],
[1.00000000e+00, 4.80000019e+00, 4.00000000e+00, 5.71000000e+02,
1.18000002e+01, 1.26999998e+01],
[1.00000000e+00, 3.09999990e+00, 1.00000000e+01, 5.12000000e+02,
8.10000000e+00, 1.01000004e+01],
[1.00000000e+00, 2.50000000e+00, 1.20000000e+01, 3.47000000e+02,
7.69999981e+00, 8.40000000e+00],
[1.00000000e+00, 1.20000005e+00, 1.50000000e+01, 2.12000000e+02,
3.29999995e+00, 2.09999990e+00],
[1.00000000e+00, 6.00000024e-01, 8.00000000e+00, 1.02000000e+02,
4.90000010e+00, 4.69999981e+00],
[1.00000000e+00, 5.40000010e+00, 1.00000000e+00, 7.88000000e+02,
1.73999996e+01, 1.23000002e+01],
[1.00000000e+00, 4.19999981e+00, 7.00000000e+00, 5.77000000e+02,
1.05000000e+01, 1.40000000e+01],
[1.00000000e+00, 4.69999981e+00, 3.00000000e+00, 5.35000000e+02,
1.13000002e+01, 1.50000000e+01],
[1.00000000e+00, 6.00000024e-01, 1.40000000e+01, 1.63000000e+02,
2.50000000e+00, 2.50000000e+00],
[1.00000000e+00, 1.20000005e+00, 1.10000000e+01, 1.68000000e+02,
4.69999981e+00, 3.29999995e+00],
[1.00000000e+00, 1.60000002e+00, 1.00000000e+01, 1.51000000e+02,
4.59999990e+00, 2.70000005e+00],
[1.00000000e+00, 4.30000019e+00, 4.00000000e+00, 3.42000000e+02,
5.50000000e+00, 1.60000000e+01],
[1.00000000e+00, 2.59999990e+00, 1.30000000e+01, 1.96000000e+02,
7.19999981e+00, 6.30000019e+00],
[1.00000000e+00, 3.79999995e+00, 7.00000000e+00, 4.53000000e+02,
1.03999996e+01, 1.38999996e+01],
[1.00000000e+00, 5.30000019e+00, 1.00000000e+00, 5.18000000e+02,
1.15000000e+01, 1.62999992e+01],
[1.00000000e+00, 5.59999990e+00, 0.00000000e+00, 6.15000000e+02,
1.23000002e+01, 1.60000000e+01],
[1.00000000e+00, 8.00000012e-01, 1.40000000e+01, 2.78000000e+02,
2.79999995e+00, 6.50000000e+00],
[1.00000000e+00, 1.10000002e+00, 1.20000000e+01, 1.42000000e+02,
3.09999990e+00, 1.60000002e+00],
[1.00000000e+00, 3.59999990e+00, 6.00000000e+00, 4.61000000e+02,
9.60000000e+00, 1.13000002e+01],
[1.00000000e+00, 3.50000000e+00, 5.00000000e+00, 3.82000000e+02,
9.80000019e+00, 1.15000000e+01],
[1.00000000e+00, 5.09999990e+00, 0.00000000e+00, 5.90000000e+02,
1.20000000e+01, 1.56999998e+01],
[1.00000000e+00, 8.60000000e+00, 8.00000000e+00, 5.17000000e+02,
7.00000000e+00, 1.20000000e+01]])
w_all_features = solve_normal_equations(X_all_features, y)
w_all_features
array([-18.85941416, 16.20157356, -5.31097141, 0.17463515,
11.52626903, 13.5803129 ])
for i, feature in enumerate(column_order):
if feature == '1':
print(f'intercept:\t{w_all_features[0]:0.3f}')
else:
print(f'{feature}:\t{w_all_features[i]:0.3f}')
intercept:	-18.859
sq_ft:	16.202
competing_stores:	-5.311
inventory:	0.175
advertising:	11.526
district_size:	13.580
The MSE of this model is even lower!
mean_squared_error(X_all_features, y, w_all_features)
242.2744571716
Note that I can't visualize this prediction rule, since I would need to be able to visualize in 6D, but I can still find this prediction rule's predictions:
X_all_features @ w_all_features
array([228.54132342, 128.77828635, 28.57159462, 526.67763281,
438.55165964, 445.8608775 , 298.19289184, 221.33815906,
24.49589951, 86.4927322 , 568.5255428 , 423.92508128,
468.73640753, 7.73991561, 70.48898947, 70.01098006,
369.96817843, 156.99564616, 393.2790273 , 506.07015112,
538.3281338 , 88.84240462, 17.48878601, 352.21694784,
347.13290225, 518.32948881, 411.92035997])
The features are measured in different units and on very different scales, so we should standardize them before comparing their weights.
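Standardizing replaces each feature value with the number of standard deviations it lies above that feature's mean:

$$ x_{\text{standardized}} = \frac{x - \bar{x}}{\sigma_x} $$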
Question: What would happen if I try to standardize the column of all 1s? 🧐
features = data[column_order].iloc[:, 1:].values
standardized_features = (features - features.mean(axis=0)) / features.std(axis=0)
X_standardized = np.column_stack([
np.ones(data.shape[0]),
standardized_features
])
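As a quick check, on a small set of made-up numbers, standardized columns should have mean 0 and standard deviation 1:

```python
import numpy as np

# Hypothetical raw feature values on very different scales.
toy_features = np.array([[294.0, 8.2],
                         [232.0, 6.9],
                         [149.0, 3.0]])
toy_standardized = (toy_features - toy_features.mean(axis=0)) / toy_features.std(axis=0)
toy_standardized.mean(axis=0), toy_standardized.std(axis=0)   # ≈ [0, 0], [1, 1]
```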
w_standardized = solve_normal_equations(X_standardized, y)
w_standardized
array([286.57407407, 31.97302867, -25.51529781, 32.76054166,
42.69274551, 68.49841225])
for i, feature in enumerate(column_order):
if feature == '1':
print(f'intercept:\t{w_standardized[0]:0.3f}')
else:
print(f'{feature}:\t{w_standardized[i]:0.3f}')
intercept:	286.574
sq_ft:	31.973
competing_stores:	-25.515
inventory:	32.761
advertising:	42.693
district_size:	68.498
The district size appears to have the largest effect on the net sales.
mean_squared_error(X_standardized, y, w_standardized)
242.2744571716
Note that standardizing has no impact on the actual predictions made by our prediction rule, and hence the MSE – it just makes the weights more interpretable.
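You can check this invariance directly on a small, made-up example: fitting on raw features and on standardized features yields identical predictions, because standardizing each column doesn't change the span of the design matrix's columns.

```python
import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(20, 2)) * np.array([100.0, 0.1])  # wildly different scales
y_toy = rng.normal(size=20)

def fit_predict(features):
    '''Fits via the normal equations (with an intercept) and returns the predictions.'''
    X = np.column_stack([np.ones(len(features)), features])
    w = np.linalg.solve(X.T @ X, X.T @ y_toy)
    return X @ w

X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
np.allclose(fit_predict(X_raw), fit_predict(X_std))   # True
```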
cars = sns.load_dataset('mpg').dropna()
cars
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | usa | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | usa | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | usa | ford torino |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 393 | 27.0 | 4 | 140.0 | 86.0 | 2790 | 15.6 | 82 | usa | ford mustang gl |
| 394 | 44.0 | 4 | 97.0 | 52.0 | 2130 | 24.6 | 82 | europe | vw pickup |
| 395 | 32.0 | 4 | 135.0 | 84.0 | 2295 | 11.6 | 82 | usa | dodge rampage |
| 396 | 28.0 | 4 | 120.0 | 79.0 | 2625 | 18.6 | 82 | usa | ford ranger |
| 397 | 31.0 | 4 | 119.0 | 82.0 | 2720 | 19.4 | 82 | usa | chevy s-10 |
392 rows × 9 columns
px.scatter(cars, x='horsepower', y='mpg', title='MPG vs. Horsepower')
A plain linear model doesn't fit this data well.
cars['1'] = 1
w_cars_one_feature = solve_normal_equations(cars[['1', 'horsepower']], cars['mpg'])
w_cars_one_feature
array([39.93586102, -0.15784473])
px.scatter(cars, x='horsepower', y='mpg', title='MPG vs. Horsepower')
x_range = np.linspace(40, 220)
fig = go.Figure()
fig.add_trace(go.Scatter(x = cars['horsepower'], y = cars['mpg'], mode = 'markers', name = 'actual'))
fig.add_trace(go.Scatter(x = x_range,
y = w_cars_one_feature[0] + w_cars_one_feature[1] * x_range,
name = 'linear prediction rule',
line=dict(color='red')))
fig.update_layout(xaxis_title = 'Horsepower', yaxis_title = 'MPG')
What if we add $\text{horsepower}^2$ as a feature?
cars['horsepower^2'] = cars['horsepower']**2
cars[['1', 'horsepower', 'horsepower^2']]
| | 1 | horsepower | horsepower^2 |
|---|---|---|---|
| 0 | 1 | 130.0 | 16900.0 |
| 1 | 1 | 165.0 | 27225.0 |
| 2 | 1 | 150.0 | 22500.0 |
| 3 | 1 | 150.0 | 22500.0 |
| 4 | 1 | 140.0 | 19600.0 |
| ... | ... | ... | ... |
| 393 | 1 | 86.0 | 7396.0 |
| 394 | 1 | 52.0 | 2704.0 |
| 395 | 1 | 84.0 | 7056.0 |
| 396 | 1 | 79.0 | 6241.0 |
| 397 | 1 | 82.0 | 6724.0 |
392 rows × 3 columns
w_cars_squared = solve_normal_equations(cars[['1', 'horsepower', 'horsepower^2']], cars['mpg'])
w_cars_squared
array([ 5.69000997e+01, -4.66189630e-01, 1.23053610e-03])
Let's look at the resulting prediction rule.
px.scatter(cars, x='horsepower', y='mpg', title='MPG vs. Horsepower')
fig = go.Figure()
fig.add_trace(go.Scatter(x = cars['horsepower'], y = cars['mpg'], mode = 'markers', name = 'actual'))
fig.add_trace(go.Scatter(x = x_range,
y = w_cars_one_feature[0] + w_cars_one_feature[1] * x_range,
name = 'linear prediction rule',
line=dict(color='red')))
fig.add_trace(go.Scatter(x = x_range,
y = w_cars_squared[0] + w_cars_squared[1] * x_range + w_cars_squared[2] * x_range**2,
name = 'quadratic prediction rule',
line=dict(color='#F7CF5D', width=5)))
fig.update_layout(xaxis_title = 'Horsepower', yaxis_title = 'MPG')
Note: this is still a linear prediction rule – it's just not linear in terms of horsepower. It is linear in terms of the parameter vector, $\vec{w}$, which means we can still use the normal equations to find $\vec{w}^*$.
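The same idea extends to any transformed features. As a sketch on made-up data: a rule that is quadratic in $x$ but linear in $\vec{w}$ can still be fit exactly by solving the normal equations.

```python
import numpy as np

# Hypothetical data generated from an exact quadratic: y = 2 - 3x + 0.5x^2.
x = np.linspace(0, 10, 11)
y = 2 - 3 * x + 0.5 * x**2

# Design matrix with columns 1, x, x^2 -- linear in the parameters.
X = np.column_stack([np.ones_like(x), x, x**2])
w = np.linalg.solve(X.T @ X, X.T @ y)
w   # recovers [2, -3, 0.5]
```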